class: center, middle, inverse, title-slide # Lecture 15 ## Factorial Designs ### Psych 10 C ### University of California, Irvine ### 05/04/2022 --- ## Factorial Designs - Last class we worked with an example of a `\(2\times2\)` between subjects factorial design using the cell means approach. -- - Today we will work with a new example using R. -- - First we will introduce a new study. --- ## Depression example - Researchers are interested on the effects of a pharmacological treatment and therapy for the treatment of depression. -- - They have design a study with 100 participants divided in 4 groups. The first group did not receive any treatment (no pharmaceutical treatment and no therapy). -- - The second group was exposed only to therapy for 1 month. -- - The third group was only exposed to a pharmaceutical treatment for 1 month. -- - The last group was exposed to both, a pharmaceutical treatment and therapy for 1 month. -- - The dependent variable in the study was the difference between pre/post treatment scores on a depression inventory (scale). `$$y_{ijk} = \text{post}_{ijk} - \text{pre}_{ijk}$$` --- ## Depression example - This is a `\(2\times2\)` between subjects factorial design. The first factor is whether participants received pharmaceutical treatment or not. We denote with `\(j = 1\)` participants that did not receive pharmaceutical treatment. -- - The second factor is whether the participant was exposed to therapy or not. We denote with `\(k = 1\)` participants that did not receive therapy. -- - As we know, there are 5 models that we need to compare for this type of study. -- - The Null model assumes that the change in depression scores between pre/post treatment will be the same regardless of the group that a participant belongs to (combination of the levels of our factors). --- ## Depression example - A pharmaceutical treatment main effects model, which assumes that only pharmaceutical treatment has an effect on the change between pre/post treatment depression scores. -- - A therapy main effects model, which assumes that only therapy has an effect on the change between pre/post treatment depression scores. -- - An additive model, which assumes that both pharmaceutical treatment and therapy have an effect on the change between pre/post treatment depression scores. The effects according to this model are independent. -- - Finally, we have the full model which assumes that the effect of a pharmaceutical treatment on the change between pre/post treatment depression scores, depends on whether the participant was also exposed to therapy or not. --- ## Experimental data - The first thing that we need to do is look at the data. Because we have two independent variables (pharmaceutical treatment and therapy) a box-plot would be useful: -- .pull-left[ ```r ggplot(data = depression) + aes(x = therapy) + aes(y = beck_diff) + aes(color = pharma_treatment) + geom_boxplot(position = position_dodge(1), outlier.shape = NA) + geom_dotplot(data = depression, binaxis = 'y', stackdir='center', dotsize = 0.37, method = "histodot", position_dodge(1), alpha = 0.2, binwidth = 0.8, mapping = aes(x = therapy, y = beck_diff, color = pharma_treatment, fill = pharma_treatment), show.legend = FALSE) + theme_classic() + xlab("Therapy") + ylab("Depression scores (post - pre)") + guides(color = guide_legend("Pharma")) + theme(axis.title.x = element_text(size = 25), axis.title.y = element_text(size = 25), axis.text = element_text(size = 20)) ``` ] .pull-right[ <img src="data:image/png;base64,#lec-15_files/figure-html/hist-score-out-1.png" style="display: block; margin: auto;" /> ] --- ## Estimating the parameters for the full model - From the specification of the model, we know that the full model has 4 parameters that we need to estimate, one mean for each of the groups. -- - This can be done using the `group_by()` function, just that this time we have to give it the 2 variables that we want to group by, which indicate whether the participant received pharmaceutical treatment or not an whether they received therapy or not. -- ```r full <- depression %>% group_by(pharma_treatment, therapy) %>% summarise("pred" = mean(beck_diff)) ``` --
--- ## Factor level means - Like we have done so far, we can obtain the means by the levels of each factor using the `group_by()` function with a single variable. These values will allow us to obtain the main effects later. -- .pull-left[ ```r pharma <- depression %>% group_by(pharma_treatment) %>% summarise("average" = mean(beck_diff)) ``` ] .pull-right[ ```r therapy_mean <- depression %>% group_by(therapy) %>% summarise("average" = mean(beck_diff)) ``` ] -- .pull-left[
] .pull-right[
] --- ## Grand mean and effect parameters - Now we only need to estimate the grand mean and we will have everything that we need to compute the predictions of each model (and therefore the error). -- .pull-left[ ```r null <- depression %>% summarise("pred" = mean(beck_diff)) %>% pull(pred) ``` ] .pull-right[ - Grand mean = -0.91 ] -- - With the grand mean now we can compute the effects of each factor `\(\hat{\alpha}_j\)` for the pharmaceutical treatment and `\(\hat{\beta}_k\)` for the effect of therapy. -- .pull-left[ ```r alpha <- pharma$average - null beta <- therapy_mean$average - null ``` ] .pull-right[ | Parameter | Value | |:--------------------:|:---------------------:| | `\(\hat{\alpha}_{no}\)` | 0.5 | | `\(\hat{\alpha}_{yes}\)` | -0.5 | | `\(\hat{\beta}_{no}\)` | 0.75 | | `\(\hat{\beta}_{yes}\)` | -0.75 | ] --- ## Model Predictions - Now we can add the predictions and the error of each model to our data. -- - First we will add the predictions and errors of the null model. Remember that the prediction of the null model is just the grand mean. -- ```r depression <- depression %>% mutate("prediction_null" = null, "error_null" = (beck_diff - prediction_null)^2) ``` -- - Then we just need to add the values in the `error_null` column of our data to get the SSE. -- ```r sse_null <- sum(depression$error_null) ``` -- - The Sum of Squared Errors for the null model was equal to 399.8 --- ## Main effects predictions and errors - Now we will add the predictions of the pharmaceutical treatment main effects model, remember that the prediction of the model for each group are equal to: `$$\text{pahrma}_{no} = \hat{\mu} + \hat{\alpha}_{no} \quad \text{pahrma}_{yes} = \hat{\mu} + \hat{\alpha}_{yes}$$` -- ```r depression <- depression %>% mutate("prediction_pharma" = case_when(pharma_treatment == "no" ~ null + alpha[1], pharma_treatment == "yes" ~ null + alpha[2]), "error_pharma" = (beck_diff - prediction_pharma)^2) ``` -- - Again, we compute the Sum of Squared Errors of the model by adding all of the values in our new column `error_pharma`. -- ```r sse_pharma <- sum(depression$error_pharma) ``` -- - The SSE for the pharmacological treatment main effects model was 375.29 --- ## Main effects predictions and errors - Now we will add the predictions of the therapy main effects model, remember that the prediction of the model for each group are equal to: `$$\text{therapy}_{no} = \hat{\mu} + \hat{\beta}_{no} \quad \text{therapy}_{yes} = \hat{\mu} + \hat{\beta}_{yes}$$` -- ```r depression <- depression %>% mutate("prediction_therapy" = case_when(therapy == "no" ~ null + beta[1], therapy == "yes" ~ null + beta[2]), "error_therapy" = (beck_diff - prediction_therapy)^2) ``` -- - Again, we compute the Sum of Squared Errors of the model by adding all of the values in our new column `error_therapy`. -- ```r sse_therapy <- sum(depression$error_therapy) ``` -- - The SSE for the therapy main effects model was 344.19 --- ## Model predictions, additive - To add the additive model's predictions to the data we will need a logical test of the value of each factor. To do this we will use the `case_when()` function and the logical connector "&". -- ```r depression <- depression %>% mutate("prediction_additive" = case_when( pharma_treatment == "no" & therapy == "no" ~ null + alpha[1] + beta[1], pharma_treatment == "no" & therapy == "yes" ~ null + alpha[1] + beta[2], pharma_treatment == "yes" & therapy == "no" ~ null + alpha[2] + beta[1], pharma_treatment == "yes" & therapy == "yes" ~ null + alpha[2] + beta[2]), "error_additive" = (beck_diff - prediction_additive)^2) ``` -- - The only difference between the additive and previous models is how we add its predictions to the data, however, the error is computed the same way as before. To compute the SSE we only need to add the values in our new column `error_additive`. ```r sse_additive <- sum(depression$error_additive) ``` -- - The SSE for the additive model was 319.68 --- ## Model predictions, full - Given that the full model also makes a different prediction by combination of factor levels we will use the same structure of the `case_when()` function as before. -- ```r depression <- depression %>% mutate("prediction_full" = case_when( pharma_treatment == "no" & therapy == "no" ~ full$pred[1], pharma_treatment == "no" & therapy == "yes" ~ full$pred[2], pharma_treatment == "yes" & therapy == "no" ~ full$pred[3], pharma_treatment == "yes" & therapy == "yes" ~ full$pred[4]), "error_full" = (beck_diff - prediction_full)^2) ``` -- - Again we can obtain the SSE of the full model by adding all of the values on the new column we created `error_full` -- ```r sse_full <- sum(depression$error_full) ``` - The SSE for the additive model was 296.44 --- ## Model comparison - Now we have all the values that we need in order to compare the 5 models. -- - We will use a table to summarize the results for each model and then select the one that better accounts for the data -- - The first thing that we would like to present is the Mean Squared error, this can be calculated by dividing the SSE by the total sample size: -- ```r n_total <- nrow(depression) mse_null <- 1/n_total * sse_null mse_pharma <- 1/n_total * sse_pharma mse_therapy <- 1/n_total * sse_therapy mse_additive <- 1/n_total * sse_additive mse_full <- 1/n_total * sse_full ``` --- ## Calculating the `\(R^2\)` by model - With the SSE we can also calculate the value of `\(R^2\)` for each of the models. This is not something that we do all the time, normally we will just report the `\(R^2\)` of the model that we selected, however, it is good practice. -- - Remember that every model is compared to the Null in order to calculate the value of `\(R^2\)` which means that there is no value for the null model. -- ```r r2_pharma <- (sse_null - sse_pharma)/sse_null r2_therapy <- (sse_null - sse_therapy)/sse_null r2_additive <- (sse_null - sse_additive)/sse_null r2_full <- (sse_null - sse_full)/sse_null ``` --- ## Calculating the BIC - Using the sample size and the MSE of each model we can calculate the BIC. The only thing that we need to know is how many unknown values (not fixed!) we need to calculate the predictions of each model. -- - Null model: we only need to obtain the estimate `\(\mu\)` (k = 1). - Main effects model: we need an estimate of `\(\mu\)` and an estimate of `\(\alpha_1\)` (or one estimate of `\(\beta_1\)`, thus k = 2). - Additive model: we need to estimate `\(\mu\)`, `\(\alpha_1\)` and `\(\beta_1\)` (k = 3). - Full model: we need to estimate `\(\mu_{11}\)`, `\(\mu_{12}\)`, `\(\mu_{21}\)` and `\(\mu_{22}\)` (k = 4). -- ```r bic_null <- n_total * log(mse_null) + 1 * log(n_total) bic_pharma <- n_total * log(mse_pharma) + 2 * log(n_total) bic_therapy <- n_total * log(mse_therapy) + 2 * log(n_total) bic_additive <- n_total * log(mse_additive) + 3 * log(n_total) bic_full <- n_total * log(mse_full) + 4 * log(n_total) ``` --- ## Model comparison: Summary - So now we can summarize the results of our models using a table: -- | Model | Parameters | MSE | `\(R^2\)` | BIC | |-------|:----------:|:---:|:-----:|:---:| | Null | 1 | 4 | NA | 143.18 | | ME Pharma | 2 | 3.75 | 0.06 | 141.46 | | ME Therapy | 2 | 3.44 | 0.14 | 132.81 | | Additive | 3 | 3.2 | 0.2 | 130.03 | | Full | 4 | 2.96 | 0.26 | 127.09 | -- - Given that the lowest BIC value corresponds to the full model we can conclude that there is an interaction between pharmaceutical treatment and therapy on the change between pre/post treatment depression scores. --- ## Interpreting the results of the model comparison - First we can say that: The model that accounted for the data better was the full model, this result suggests that there is an interaction between pharmacological treatment and therapy. -- - The predicted values for the full model were:
- The values suggest that the effect of a pharmacological treatment on the change between pre/post treatment depression scores was larger when participants where also exposed to therapy. This means that the effects of one factor are modulated be the level of the other. --- ## Interpreting the results of the model comparison - Finally, the value of `\(R^2\)` from the full model shows that: The full model accounted for 26% of the total variability in our observations. --- - Homework link: ```r link <- "https://raw.githubusercontent.com/ManuelVU/psych-10c-data/main/homework4.csv" ```